Abstract


The Tau method based on the Bernoulli polynomials is implemented
efficiently to approximate the open-loop Nash equilibrium of nonlinear
differential games over a finite time horizon. In this treatment, the
system of two-point boundary value problems of the differential game,
extracted from Pontryagin’s maximum principle, is transformed into a
system of algebraic equations that can be solved by Newton’s iteration
method. In addition, the convergence analysis and an upper bound on the
error of the Bernoulli polynomial approximation are discussed. To
demonstrate the applicability and accuracy of the proposed approach, some
illustrative examples are presented at the end.
1 Introduction


A differential game is an extension of optimal control theory that describes a
conflict situation among several players, each of whom seeks to maximize or
minimize his own payoff subject to a dynamical system. In recent years, it
has arisen in practical problems ranging from economics to engineering
applications [29, 30, 44, 11, 15, 31].

One of the most important solution concepts in game theory is the Nash
equilibrium, in which players have no incentive to deviate from their
original plans [23]. In differential games, it is classified into the
following two cases according to the information about the state of the
game that is available to the players over time:


• The players have no information during the game and only know the
game state at the initial time. This kind of equilibrium is known as
open-loop Nash equilibrium.

• The current game state is known to players. Such equilibrium is often
called feedback Nash equilibrium.


The main approaches to computing the open-loop Nash equilibrium in
differential games are indirect methods and direct methods. In indirect
methods, the nonzero-sum differential game is reduced to a system of
two-point boundary value problems (TPBVPs) by using the necessary
optimality conditions of Pontryagin’s maximum principle, and this system
can be solved analytically or numerically [5]. In direct approaches, which
are optimization-based methods, the differential game problem is
transformed into a mathematical programming problem [12, 20]. However, the
drawback of direct approaches is that there is no guarantee that the
solution obtained is feasible for the original problem [27, 42].

Regarding the indirect methods, most researchers have focused on a
special kind of differential game, namely linear-quadratic dynamic games,
in which the state equation is linear with respect to the control and state
variables and the performance indices are quadratic in both. For this kind
of differential game, the systems of TPBVPs are linear in general, and hence
the open-loop Nash equilibrium can be obtained analytically by solving
Riccati equations [14, 17]. In practice, however, we face differential
games whose systems of TPBVPs are generally nonlinear. Therefore, suitable
numerical methods are necessary [35].

To the best of our knowledge, only a few research works have been carried
out on computing the open-loop Nash equilibrium of differential games in
the nonlinear case. In [27], a pseudospectral method based on Chebyshev
polynomials was applied for finding the players’ open-loop strategies in
nonlinear differential games. In [24], the open-loop Nash equilibrium of
differential games in the polynomial case was obtained by Riccati
equations. In [9], the coordinate transformation approach was extended for
computing the open-loop Nash equilibrium, and in [43], complementarity
theory was applied to solve a class of zero-sum differential games. In
[32], a quasilinearization method combined with exponential Bernstein
functions was introduced for the numerical solution of TPBVPs.

There are several methods for solving differential equations numerically,
such as spectral methods [8], shooting and multiple shooting methods [25],
the variational iteration method [16], and the homotopy analysis method [1],
each of which has its own implementation. Spectral methods are based on the
weighted residual method, yield highly accurate results in solving
differential equations, and are classified into three types, namely
collocation, Galerkin, and Tau methods [19, 22, 34, 41].

The Tau method is one of the most accurate spectral methods for a nu- 
merical solution to differential equations of different kinds [39, 3, 38, 18]. 
The goal of this paper is to propose an implementation of the Bernoulli Tau 
method (BTM), in which the solution functions are defined by means of a 
truncated Bernoulli series expansion, to compute the open-loop Nash equi- 
librium in nonlinear differential games with finite horizons. 

The remainder of the paper is organized into the following sections. In 
Section 2, the nonzero-sum nonlinear differential games are defined, and the 
extraction of the systems of TPBVPs from Pontryagin’s maximum principle is 
described. In Section 3, the Tau approach based on the Bernoulli polynomials 
is introduced and applied for computing the open-loop strategies of these 
differential games. In Section 4, some numerical examples are presented 
to validate the accuracy and applicability of the present method. Finally,
conclusions are presented in Section 5.


2 Problem statement 


A family of nonzero-sum nonlinear differential games with finite horizon is 
described in the following definition. 


Definition 1. A nonzero-sum nonlinear differential game is defined as follows
[6]:

\[
\begin{aligned}
&\max_{u_i} J_i(u_i(\cdot), u_{-i}(\cdot))
  = \int_0^T K_i(t, x(t), u_1(t), u_2(t), \ldots, u_m(t))\,dt + \Phi_i(x(T)), \\
&\dot{x}(t) = f(t, x(t), u_1(t), u_2(t), \ldots, u_m(t)), \\
&x(0) = x_0 \in \mathbb{R},
\end{aligned}
\tag{1}
\]

where $u_i(t) \in U_i \subset \mathbb{R}$ is player $i$’s control (strategy),
$x(t) \in \mathbb{R}$ is the state vector of the differential game, and
$M = \{1, 2, \ldots, m\}$ is the set of players. The functions
$K_i(t, x(t), u_1(t), u_2(t), \ldots, u_m(t))$ and $\Phi_i(x(T))$,
$i = 1, 2, \ldots, m$, are continuously differentiable functions that
describe player $i$’s running payoff and terminal payoff, respectively. The
goal of this differential game for each player $i$, $i = 1, 2, \ldots, m$,
is to maximize his payoff by choosing a suitable strategy
$u_i(t) \in U_i \subset \mathbb{R}$.
For differential game (1), the open-loop Nash equilibrium is described as
follows.


Definition 2. The control actions $u_i^*(\cdot)$, $i = 1, 2, \ldots, m$, are
considered as a Nash equilibrium for differential game (1) if the following
inequalities hold:

\[
J_i(u_i^*(\cdot), u_{-i}^*(\cdot)) \ge J_i(u_i(\cdot), u_{-i}^*(\cdot))
\quad \text{for all } u_i \in U_i,
\]

where $u_i$ is the $i$th player’s strategy and $u_{-i}$ denotes the other
players’ strategies, that is, $u_{-i} = (u_j)_{j \in M,\, j \ne i}$.


For deriving the first-order necessary optimality conditions of nonlinear
differential game (1) and characterizing an open-loop strategy, the
Hamiltonian functions are defined as follows:

\[
H_i(t, x, u_i, u_{-i}, \lambda_i) = K_i(t, x, u_i, u_{-i})
  + \lambda_i f(t, x, u_i, u_{-i}), \quad i = 1, 2, \ldots, m,
\]

where the variables $\lambda_i$, $i = 1, 2, \ldots, m$, are the adjoint
functions.

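Pontryagin’s maximum principle provides a set of optimality conditions
for control actions to construct an open-loop strategy in nonlinear
differential game (1) as follows:

\begin{align}
&\dot{x}(t) = f(t, x(t), u_1(t), \ldots, u_m(t)), \quad x(0) = x_0, \tag{2} \\
&\dot{\lambda}_i(t) = -\frac{\partial H_i}{\partial x}(t, x(t), u_i(t), u_{-i}(t), \lambda_i(t)),
  \quad \lambda_i(T) = \frac{\partial \Phi_i(x(T))}{\partial x}, \tag{3} \\
&\frac{\partial H_i}{\partial u_i}(t, x(t), u_i(t), u_{-i}(t), \lambda_i(t)) = 0,
  \quad i = 1, 2, \ldots, m. \tag{4}
\end{align}

An expression for $u_i(t)$ with respect to $x(t)$ and $\lambda_i(t)$ can
be obtained by solving the algebraic equations (4) as follows:

\[
u_i(t) = \Psi_i(t, x(t), \lambda_i(t)), \quad i = 1, 2, \ldots, m.
\]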
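This expression is substituted into (2) and (3) to obtain the system of
TPBVPs in terms of $x(t)$ and $\lambda_i(t)$, $i = 1, 2, \ldots, m$, as
follows:

\[
\begin{aligned}
\dot{x}(t) &= f(t, x(t), \Psi_1(t), \Psi_2(t), \ldots, \Psi_m(t)), \\
\dot{\lambda}_i(t) &= -\frac{\partial H_i}{\partial x}(t, x(t), \Psi_i(t), \Psi_{-i}(t), \lambda_i(t)), \\
x(0) &= x_0, \\
\lambda_i(T) &= \frac{\partial \Phi_i(x(T))}{\partial x},
\end{aligned}
\tag{5}
\]

where $\Psi_i = \Psi_i(t, x(t), \lambda_i(t))$, $i = 1, 2, \ldots, m$.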

This system of differential equations with split boundary conditions is 
nonlinear generally, which makes it difficult or impossible to be solved ana- 
lytically. Therefore, using an appropriate numerical approach is required. 
3 The Bernoulli Tau method for nonlinear differential games


In this part, an efficient formulation of the Tau method for the numerical
solution of the system of TPBVPs is established in order to obtain the
open-loop strategy of nonlinear differential game (1).

The Tau method is a highly accurate spectral method for solving
differential equations numerically. Its implementation is based on
expanding the solution functions $f(t)$ of the differential equations in
terms of suitable basis polynomials such as Bernoulli [40, 33], Jacobi
[4, 28], and Bernstein polynomials [21] as follows:

\[
f(t) = \sum_{i=0}^{\infty} f_i P_i(t),
\]

where $f_i$ and $P_i(t)$, $i = 0, 1, 2, \ldots$, are unknown coefficients and
basis polynomials, respectively [7].

In practice, we use only a finite number of these basis polynomials,
meaning that $f^n(t) = \sum_{i=0}^{n} f_i P_i(t)$ is a numerical
approximation of the exact solution $f(t)$.

In this paper, the Bernoulli polynomials are considered as basis
polynomials; their definition and their properties in function
approximation are stated below.


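Definition 3. Bernoulli polynomials of order $n$ are defined on $[0, 1]$ by
(see [26])

\[
\beta_n(t) = \sum_{i=0}^{n} \binom{n}{i} \alpha_{n-i}\, t^i,
\]

where $\alpha_i$, $i = 0, 1, \ldots, n$, are Bernoulli numbers, defined by

\[
\frac{t}{e^t - 1} = \sum_{i=0}^{\infty} \alpha_i \frac{t^i}{i!}.
\]

The first few Bernoulli numbers are

\[
\alpha_0 = 1, \quad \alpha_1 = -\frac{1}{2}, \quad \alpha_2 = \frac{1}{6},
\quad \alpha_4 = -\frac{1}{30},
\]

with $\alpha_{2i+1} = 0$ for $i = 1, 2, 3, \ldots$.

The first seven Bernoulli polynomials are

\[
\begin{aligned}
\beta_0(t) &= 1, \\
\beta_1(t) &= t - \frac{1}{2}, \\
\beta_2(t) &= t^2 - t + \frac{1}{6}, \\
\beta_3(t) &= t^3 - \frac{3}{2} t^2 + \frac{1}{2} t, \\
\beta_4(t) &= t^4 - 2t^3 + t^2 - \frac{1}{30}, \\
\beta_5(t) &= t^5 - \frac{5}{2} t^4 + \frac{5}{3} t^3 - \frac{1}{6} t, \\
\beta_6(t) &= t^6 - 3t^5 + \frac{5}{2} t^4 - \frac{1}{2} t^2 + \frac{1}{42}.
\end{aligned}
\]
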
A complete basis is formed by these polynomials over the interval [0, 1]. 
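
Definition 3 can be evaluated exactly with rational arithmetic. The following Python sketch is illustrative only: the function names are not from the paper, and the Bernoulli numbers are generated from the standard recurrence $\sum_{j=0}^{k} \binom{k+1}{j} \alpha_j = 0$ ($k \ge 1$), which is equivalent to the generating function above under the convention $\alpha_1 = -1/2$.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Return [alpha_0, ..., alpha_n] with the convention alpha_1 = -1/2."""
    alpha = [Fraction(1)]
    for k in range(1, n + 1):
        # Recurrence: sum_{j=0}^{k} C(k+1, j) * alpha_j = 0 for k >= 1.
        s = sum(comb(k + 1, j) * alpha[j] for j in range(k))
        alpha.append(-s / (k + 1))
    return alpha


def bernoulli_poly(n):
    """Ascending monomial coefficients of beta_n(t) = sum_i C(n,i) alpha_{n-i} t^i."""
    alpha = bernoulli_numbers(n)
    return [comb(n, i) * alpha[n - i] for i in range(n + 1)]


def evaluate(coeffs, t):
    """Horner evaluation of a polynomial given by ascending coefficients."""
    result = Fraction(0)
    for c in reversed(coeffs):
        result = result * t + c
    return result


# Print the coefficient lists of beta_0, ..., beta_6 for comparison
# with the polynomials listed in Definition 3.
for n in range(7):
    print(n, [str(c) for c in bernoulli_poly(n)])
```

Working with `Fraction` keeps the coefficients exact, which matters because the monomial coefficients of higher-order Bernoulli polynomials alternate in sign and are prone to cancellation in floating point.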


Any function $f(t)$ belonging to $L^2[0, 1]$ can be approximated by Bernoulli
polynomials as follows:

\[
f(t) \approx f^n(t) = \sum_{i=0}^{n} f_i \beta_i(t).
\]
i=0 


To apply the Tau method based on the Bernoulli polynomials for solving
the system of TPBVPs (5), for simplicity and without loss of generality, we
consider $T = 1$. Then the unknown functions $x(t)$ and $\lambda_i(t)$,
$i = 1, \ldots, m$, can be approximated by finite expansions of Bernoulli
polynomials as follows:

\[
\begin{aligned}
x(t) &\approx x^n(t) = \sum_{j=0}^{n} a_j \beta_j(t) = A^T \mathcal{B}(t), \\
\lambda_i(t) &\approx \lambda_i^n(t) = \sum_{j=0}^{n} b_{ij} \beta_j(t) = B_i^T \mathcal{B}(t),
  \quad i = 1, 2, \ldots, m,
\end{aligned}
\]

where $A^T = [a_0, a_1, \ldots, a_n]$ and $B_i^T = [b_{i0}, b_{i1}, \ldots, b_{in}]$,
$i = 1, \ldots, m$, are the vectors of unknown coefficients and
$\mathcal{B}(t) = [\beta_0(t), \beta_1(t), \ldots, \beta_n(t)]^T$ is the
vector of Bernoulli polynomials.


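The residual functions are defined by substituting these expansions into the
differential equations of the system of TPBVPs (5) as follows:

\[
\begin{aligned}
R_0(t) &= \dot{x}^n(t) - f(t, x^n(t), \Psi_1^n(t), \Psi_2^n(t), \ldots, \Psi_m^n(t)), \\
R_i(t) &= \dot{\lambda}_i^n(t) + \frac{\partial H_i}{\partial x}(t, x^n(t), \Psi_i^n(t), \Psi_{-i}^n(t), \lambda_i^n(t)),
  \quad i = 1, \ldots, m,
\end{aligned}
\]

where $\Psi_i^n(t) = \Psi_i(t, x^n(t), \lambda_i^n(t))$. Then, multiplying
these residuals by $\beta_j(t)$, $j = 0, 1, \ldots, n-1$, integrating over the
interval $[0, 1]$, and setting the results equal to zero, together with the
boundary conditions, yields the following system of $(m+1)(n+1)$ algebraic
equations, which can be solved by Newton’s iteration method to determine
the unknown vectors $A$ and $B_i$, $i = 1, \ldots, m$:

\[
\begin{aligned}
&\int_0^1 R_i(t)\, \beta_j(t)\, dt = 0, \quad i = 0, 1, \ldots, m, \quad j = 0, 1, \ldots, n-1, \\
&x^n(0) = x_0, \\
&\lambda_i^n(1) = \frac{\partial \Phi_i(x^n(1))}{\partial x}, \quad i = 1, \ldots, m.
\end{aligned}
\]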
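
Newton's iteration for such an algebraic system can be sketched generically. The Python code below is a minimal illustration and not the authors' Maple implementation: the function names, the forward-difference Jacobian, and the two-equation toy system that stands in for the Tau equations are all assumptions made here for brevity.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def newton(F, c0, tol=1e-12, max_iter=50, h=1e-7):
    """Newton's iteration for F(c) = 0 with a forward-difference Jacobian."""
    c = list(c0)
    for _ in range(max_iter):
        f0 = F(c)
        if max(abs(v) for v in f0) < tol:
            break
        jac = [[0.0] * len(c) for _ in f0]
        for j in range(len(c)):
            shifted = c[:]
            shifted[j] += h
            fj = F(shifted)
            for i in range(len(f0)):
                jac[i][j] = (fj[i] - f0[i]) / h
        step = solve_linear(jac, [-v for v in f0])
        c = [ci + si for ci, si in zip(c, step)]
    return c


# Toy stand-in for the Tau equations: c0^2 + c1^2 = 4 and c0 = c1.
root = newton(lambda c: [c[0] ** 2 + c[1] ** 2 - 4.0, c[0] - c[1]], [1.0, 0.5])
print(root)
```

In the Tau setting, `F` would return the $(m+1)(n+1)$ weighted-residual and boundary equations evaluated at the stacked coefficient vectors, and the initial guess `c0` would need to be chosen with some care for nonlinear games.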
In the following theorem, the convergence analysis and the error upper bound
for the mentioned approximation by the Bernoulli polynomials are
discussed.


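Theorem 1. Suppose that $x(t)$ and $\lambda_i(t)$, $i = 1, \ldots, m$, belong
to $C^{n+1}[0, 1]$ and that
$S_n = \mathrm{span}\{\beta_0(t), \beta_1(t), \ldots, \beta_n(t)\}$. If
$A^T \mathcal{B}(t) \in S_n$ and $B_i^T \mathcal{B}(t) \in S_n$,
$i = 1, 2, \ldots, m$, are the best approximations of $x(t)$ and
$\lambda_i(t)$, $i = 1, \ldots, m$, respectively, then

\[
\| x(t) - A^T \mathcal{B}(t) \|_{L^2[0,1]} \le \frac{C}{(n+1)!\, \sqrt{2n+3}}
\]

and

\[
\| \lambda_i(t) - B_i^T \mathcal{B}(t) \|_{L^2[0,1]} \le \frac{C_i}{(n+1)!\, \sqrt{2n+3}},
\]

where $C = \max_{t \in [0,1]} |x^{(n+1)}(t)|$ and
$C_i = \max_{t \in [0,1]} |\lambda_i^{(n+1)}(t)|$, $i = 1, 2, \ldots, m$.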


Proof. The proof is given for the first inequality; the other inequalities
can be proved in a similar manner.

Since $x(t) \in C^{n+1}[0, 1]$, the derivative $x^{(n+1)}(t)$ is continuous
on $[0, 1]$ and hence bounded there by
$C = \max_{t \in [0,1]} |x^{(n+1)}(t)|$, and $x(t)$ can be expanded by
Taylor’s formula as

\[
x(t) = \sum_{k=0}^{n} \frac{x^{(k)}(0)}{k!} t^k + \frac{x^{(n+1)}(\xi)}{(n+1)!} t^{n+1}
     = \tilde{x}(t) + \frac{x^{(n+1)}(\xi)}{(n+1)!} t^{n+1},
\]

where $\tilde{x}(t) = \sum_{k=0}^{n} \frac{x^{(k)}(0)}{k!} t^k$ and
$\xi \in [0, t]$. Hence, we have

\[
|x(t) - \tilde{x}(t)| = \left| \frac{x^{(n+1)}(\xi)}{(n+1)!} \right| t^{n+1}.
\]

Because $A^T \mathcal{B}(t)$ is the best approximation of $x(t)$ out of
$S_n$ and $\tilde{x}(t) \in S_n$, the above equality implies

\[
\begin{aligned}
\| x(t) - A^T \mathcal{B}(t) \|_{L^2[0,1]}
  &\le \| x(t) - \tilde{x}(t) \|_{L^2[0,1]}
   = \left( \int_0^1 \left( \frac{x^{(n+1)}(\xi)}{(n+1)!} t^{n+1} \right)^2 dt \right)^{1/2} \\
  &\le \frac{C}{(n+1)!} \left( \int_0^1 t^{2n+2}\, dt \right)^{1/2}
   = \frac{C}{(n+1)!\, \sqrt{2n+3}},
\end{aligned}
\]

and this completes the proof.
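
The Taylor-polynomial step of the proof can be checked numerically. The sketch below is an illustration under assumptions made here: the test function $x(t) = e^{-t}$ (for which $C = 1$ on $[0, 1]$), the composite Simpson rule, and all function names are choices for this example rather than part of the paper.

```python
from math import exp, factorial, sqrt


def taylor(n, t):
    """Degree-n Taylor polynomial of exp(-t) about t = 0."""
    return sum((-t) ** k / factorial(k) for k in range(n + 1))


def l2_error(n, panels=2000):
    """Composite Simpson estimate of ||exp(-t) - taylor_n(t)|| in L2[0, 1]."""
    h = 1.0 / panels
    total = 0.0
    for i in range(panels + 1):
        t = i * h
        weight = 1 if i in (0, panels) else (4 if i % 2 else 2)
        total += weight * (exp(-t) - taylor(n, t)) ** 2
    return sqrt(total * h / 3.0)


def bound(n, c=1.0):
    """Right-hand side of Theorem 1: C / ((n + 1)! * sqrt(2n + 3))."""
    return c / (factorial(n + 1) * sqrt(2 * n + 3))


# The Taylor error dominates the best-approximation error, so it must
# itself stay below the bound for every n.
for n in (2, 4, 6, 8):
    print(n, l2_error(n), bound(n))
```

Since the best approximation in $S_n$ is at least as good as the Taylor polynomial, its error is smaller still; the bound decays factorially in $n$.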


Remark 1. It should be noted that for practical use of Bernoulli polynomi- 
als on the interval [a, b], it is necessary to shift the defining domain by the 
following variable substitution and construct the shifted Bernoulli polynomi- 
als: 
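
A minimal sketch of Remark 1, assuming the usual affine substitution $s = (t - a)/(b - a)$ that maps $[a, b]$ onto $[0, 1]$; the substitution and the function names are assumptions made for this illustration.

```python
from fractions import Fraction
from math import comb


def bernoulli_numbers(n):
    """Return [alpha_0, ..., alpha_n] with the convention alpha_1 = -1/2."""
    alpha = [Fraction(1)]
    for k in range(1, n + 1):
        s = sum(comb(k + 1, j) * alpha[j] for j in range(k))
        alpha.append(-s / (k + 1))
    return alpha


def bernoulli(n, s):
    """beta_n(s) on the reference interval [0, 1]."""
    alpha = bernoulli_numbers(n)
    return sum(comb(n, i) * alpha[n - i] * s ** i for i in range(n + 1))


def shifted_bernoulli(n, t, a, b):
    """beta_n evaluated at s = (t - a) / (b - a), so that t ranges over [a, b]."""
    s = Fraction(t - a) / Fraction(b - a)
    return bernoulli(n, s)


# The midpoint of [2, 4] maps to s = 1/2, where beta_1 vanishes.
print(shifted_bernoulli(1, 3, 2, 4))
```

With this convention a game posed on $[0, T]$ is handled by taking $a = 0$ and $b = T$; derivatives with respect to $t$ then pick up a factor $1/(b - a)$ by the chain rule.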


4 Numerical illustrations 


In this part, three differential game problems are presented to illustrate the
accuracy and efficiency of the proposed approach. Example 1 is a
linear-quadratic differential game whose exact solution can be obtained.
Comparing the numerical results of this example with the exact solution
allows us to verify and validate the proposed approach. Example 2 is also a
linear-quadratic differential game with an attainable exact solution. In
this example, we compare the results of the proposed method with those of
the Chebyshev pseudospectral method (CPM) presented in [27]. Example 3 is a
differential game arising from an economic model with a nonlinear system of
TPBVPs whose exact solution is not available. To check the performance of
the proposed method for this problem, a residual function is defined.

All the computations associated with the proposed method have been
performed in Maple 17 with 32-digit precision on a Core(TM) i7
PC with a 2.70 GHz CPU and 16 GB of RAM.


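Example 1. For this differential game, the state equation is [13]

\[
\dot{x}(t) = u_1(t) + u_2(t), \quad x(0) = 1,
\]

and the two players’ performance indices are

\[
\begin{aligned}
\min_{u_1} J_1 &= \int_0^1 \left( -x^2(t) + u_1^2(t) \right) dt, \\
\min_{u_2} J_2 &= \int_0^1 \left( 2x^2(t) + u_2^2(t) \right) dt + x^2(1).
\end{aligned}
\]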


The exact open-loop Nash equilibrium of this differential game is [13]

\[
u_1^*(t) = e^{-t} - e^{-1}, \quad u_2^*(t) = -2e^{-t} + e^{-1}.
\]

Hence, the exact values of the players’ performance indices are

\[
J_1^* = -0.32975303263305, \quad J_2^* = 1.9344880850240.
\]


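The system of TPBVPs for the mentioned game is stated as

\[
\begin{aligned}
\dot{x}(t) &= -\frac{\lambda_1(t) + \lambda_2(t)}{2}, \\
\dot{\lambda}_1(t) &= 2x(t), \\
\dot{\lambda}_2(t) &= -4x(t), \\
x(0) &= 1, \\
\lambda_1(1) &= 0, \quad \lambda_2(1) = 2x(1).
\end{aligned}
\]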
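
For readers who want to trace how this system follows from conditions (2)-(4), a short worked derivation is sketched here. It assumes the Hamiltonians are formed from the integrands of $J_1$ and $J_2$ exactly as in Section 2, and that the terminal payoff of the second player is $x^2(1)$.

```latex
\begin{aligned}
H_1 &= -x^2 + u_1^2 + \lambda_1 (u_1 + u_2), &
H_2 &= 2x^2 + u_2^2 + \lambda_2 (u_1 + u_2), \\
\frac{\partial H_1}{\partial u_1} &= 2u_1 + \lambda_1 = 0
  \;\Rightarrow\; u_1 = -\tfrac{1}{2}\lambda_1, &
\frac{\partial H_2}{\partial u_2} &= 2u_2 + \lambda_2 = 0
  \;\Rightarrow\; u_2 = -\tfrac{1}{2}\lambda_2, \\
\dot{\lambda}_1 &= -\frac{\partial H_1}{\partial x} = 2x, &
\dot{\lambda}_2 &= -\frac{\partial H_2}{\partial x} = -4x, \\
\lambda_1(1) &= 0, &
\lambda_2(1) &= \frac{d}{dx}\, x^2 \Big|_{x = x(1)} = 2x(1).
\end{aligned}
```

Substituting $u_1 = -\lambda_1/2$ and $u_2 = -\lambda_2/2$ into $\dot{x} = u_1 + u_2$ gives $\dot{x} = -(\lambda_1 + \lambda_2)/2$, so all three differential equations are linear, as expected for a linear-quadratic game.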
The values of the performance indices obtained by the proposed approach and
their comparison with the analytical solutions are shown in Table 1. Also,
the approximate solutions and the exact solutions for $n = 10$, together
with the absolute errors, are plotted in Figure 1.


Table 1: Comparison of optimal payoff functionals J_1 and J_2 obtained by
BTM with the exact solutions and also the CPU time(s) for Example 1.

n    J_1^BTM                    J_2^BTM                   |J_1^BTM - J_1^*|   |J_2^BTM - J_2^*|   CPU time (s)
4    -0.32975302954236861650    1.93448814833633875533    3.09 × 10^-9        5.23 × 10^-8        0.124
6    -0.32975303263303305145    1.93448808502434431993    1.35 × 10^-14       2.27 × 10^-8        0.156
8    -0.32975303263304656749    1.93448808502406878964    1.69 × 10^-20       2.84 × 10^-19       0.218
10   -0.32975303263304656750    1.93448808502406878929    8.17 × 10^-27       1.37 × 10^-25       0.328
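
Because the TPBVP of Example 1 is linear, its Tau equations form a linear system, so the method can be tried without Newton's iteration. The following Python sketch is a reimplementation under assumptions, not the authors' Maple code: it uses exact rational arithmetic, tests the residuals against $\beta_0, \ldots, \beta_{n-1}$ as in Section 3, and compares with the closed forms $J_1^* = -2/e + 3/e^2$ and $J_2^* = 3 - 4/e + 3/e^2$, which were derived here from the solution $x(t) = e^{-t}$ of the system and agree with the quoted exact values to the printed digits.

```python
from fractions import Fraction
from math import comb, exp


def bernoulli_basis(n):
    """Ascending monomial coefficients of beta_0, ..., beta_n."""
    alpha = [Fraction(1)]
    for k in range(1, n + 1):
        s = sum(comb(k + 1, j) * alpha[j] for j in range(k))
        alpha.append(-s / (k + 1))
    return [[comb(k, i) * alpha[k - i] for i in range(k + 1)]
            for k in range(n + 1)]


def derivative(p):
    """Coefficients of p'(t)."""
    return [i * c for i, c in enumerate(p)][1:]


def inner(p, q):
    """Exact integral of p(t) * q(t) over [0, 1]."""
    return sum(a * b / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))


def value(p, t):
    """Evaluate the polynomial p at t."""
    return sum(c * t ** i for i, c in enumerate(p))


def solve(a, b):
    """Gauss-Jordan elimination in exact rational arithmetic."""
    n = len(b)
    m = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def btm_example1(n):
    """Tau discretization of the Example 1 TPBVP; returns (J1, J2)."""
    beta = bernoulli_basis(n)
    size = n + 1
    zero = [0] * size
    rows, rhs = [], []
    for j in range(n):  # test functions beta_0, ..., beta_{n-1}
        mass = [inner(b, beta[j]) for b in beta]
        stiff = [inner(derivative(b), beta[j]) for b in beta]
        half = [v / 2 for v in mass]
        rows.append(stiff + half + half)                    # x' + (l1 + l2)/2
        rows.append([-2 * v for v in mass] + stiff + zero)  # l1' - 2x
        rows.append([4 * v for v in mass] + zero + stiff)   # l2' + 4x
        rhs += [0, 0, 0]
    at0 = [value(b, 0) for b in beta]
    at1 = [value(b, 1) for b in beta]
    rows.append(at0 + zero + zero)                          # x(0) = 1
    rows.append(zero + at1 + zero)                          # l1(1) = 0
    rows.append([-2 * v for v in at1] + zero + at1)         # l2(1) = 2 x(1)
    rhs += [1, 0, 0]
    c = solve(rows, rhs)

    def combine(coeffs):
        p = [Fraction(0)] * size
        for ck, b in zip(coeffs, beta):
            for i, bi in enumerate(b):
                p[i] += ck * bi
        return p

    x, l1, l2 = (combine(c[k * size:(k + 1) * size]) for k in range(3))
    j1 = -inner(x, x) + inner(l1, l1) / 4                   # u1 = -l1 / 2
    j2 = 2 * inner(x, x) + inner(l2, l2) / 4 + value(x, 1) ** 2  # u2 = -l2 / 2
    return float(j1), float(j2)


J1_EXACT = -2 / exp(1) + 3 / exp(2)
J2_EXACT = 3 - 4 / exp(1) + 3 / exp(2)
for n in (4, 6):
    j1, j2 = btm_example1(n)
    print(n, abs(j1 - J1_EXACT), abs(j2 - J2_EXACT))
```

The printed errors should shrink quickly as $n$ grows. Whether they match the entries of Table 1 digit for digit depends on details of the original implementation that are not restated here, and double precision limits the comparison to roughly sixteen digits.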


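Example 2. For this differential game, the state equation is [13]

\[
\dot{x}(t) = 2x(t) + u_1(t) + u_2(t), \quad x(0) = 1,
\]

and the two players’ performance indices are

\[
\begin{aligned}
J_1 &= \int_0^3 \left( x^2(t) + u_1^2(t) \right) dt, \\
J_2 &= \int_0^3 \left( 4x^2(t) + u_2^2(t) \right) dt + 5x^2(3).
\end{aligned}
\]

The exact open-loop Nash equilibrium of this differential game is

\[
u_1^*(t) = -e^{-3t} + \frac{e^{-2t}}{e^{3}}, \quad
u_2^*(t) = -4e^{-3t} - \frac{e^{-2t}}{e^{3}}.
\]

Hence, the exact values of the players’ performance indices are

\[
J_1^* = 0.3140381912,
\]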
[Figure 1 panels: x(t), u_1(t), and u_2(t) plotted against t (exact vs.
BTM), each paired with a plot of its absolute error.]

Figure 1: Plots of the approximate solutions and the analytical solutions 
together with absolute errors with n = 10 for Example 1. 


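\[
J_2^* = 3.4136123279.
\]

The system of TPBVPs for the mentioned game is stated as

\[
\begin{aligned}
\dot{x}(t) &= 2x(t) - \frac{\lambda_1(t)}{2} - \frac{\lambda_2(t)}{2}, \\
\dot{\lambda}_1(t) &= -2x(t) - 2\lambda_1(t), \\
\dot{\lambda}_2(t) &= -8x(t) - 2\lambda_2(t), \\
x(0) &= 1, \\
\lambda_1(3) &= 0, \quad \lambda_2(3) = 10x(3).
\end{aligned}
\]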
The values of the performance indices obtained by the proposed approach and
their comparison with the analytical solutions are shown in Table 2.
Besides, to compare the BTM results with an existing approach, the results
obtained by the CPM [27] are shown in Table 3.


Table 2: Comparison of optimal payoff functionals J_1 and J_2 obtained by
BTM with the exact solutions and also the CPU time(s) for Example 2.

n    J_1^BTM            J_2^BTM            |J_1^BTM - J_1^*|   |J_2^BTM - J_2^*|   CPU time (s)
10   0.31403763402282   3.41361478021289   5.57 × 10^-7        2.45 × 10^-6        0.437
15   0.31403819123820   3.41361232797387   1.07 × 10^-14       4.58 × 10^-14       0.640
20   0.31403819123819   3.41361232797391   8.77 × 10^-24       3.85 × 10^-23       1.154


Table 3: Comparison of optimal payoff functionals J_1 and J_2 obtained by
CPM with the exact solutions for Example 2.

n    J_1^CPM        J_2^CPM        |J_1^CPM - J_1^*|   |J_2^CPM - J_2^*|
10   0.3140689582   3.4134809955   0.0000307670        0.0001313324
15   0.3140381906   3.4136123306   6.1 × 10^-10        2.7 × 10^-9
20   0.3140381912   3.4136123279   5.2 × 10^-11        223 i


Tables 2 and 3 indicate that, for the same number of basis functions, the
results obtained by the proposed method are more accurate than the results
obtained by the CPM in this example.


Example 3. The following differential game describes the competition
between two players harvesting a renewable natural resource. The state
equation of this game is expressed as

\[
\dot{x}(t) = 0.1x(t) - 0.001x^2(t) - x(t)u_1(t) - x(t)u_2(t), \quad x(0) = 1.
\]

The players’ payoffs are given by

\[
\begin{aligned}
J_1(u_1, u_2) &= \int_0^1 \left( 3x(t)u_1(t) - \tfrac{1}{2} u_1^2(t) \right) dt, \\
J_2(u_1, u_2) &= \int_0^1 \left( 2x(t)u_2(t) - \tfrac{1}{2} u_2^2(t) \right) dt,
\end{aligned}
\]

where the value $x(t) > 0$ is the resource level and the amounts
$u_1(t) > 0$ and $u_2(t) > 0$ are the players’ efforts for harvesting this
resource, all at time $t$. Moreover, $\tfrac{1}{2} u_1^2$ and
$\tfrac{1}{2} u_2^2$ indicate the costs for harvesting at effort levels
$u_1$ and $u_2$, respectively [9].


Remark 2 (see [35]). By the linearity of the state equation of this
differential game with respect to the control variables $u_i$, $i = 1, 2$,
and the concavity of the integrand of the performance index $J_i$,
$i = 1, 2$, with respect to $u_i$, $i = 1, 2$ (because
$\frac{\partial^2 H_i}{\partial u_i^2} = -1 < 0$, $i = 1, 2$), it follows
that the open-loop strategy exists and is unique for this dynamic game by
the Filippov–Cesari existence theorem [10].

The nonlinear system of TPBVPs extracted from Pontryagin’s maximum
principle for this differential game is stated as follows:

\[
\begin{aligned}
\dot{x} &= 0.1x - 5.001x^2 + x^2 \lambda_1 + x^2 \lambda_2, \\
\dot{\lambda}_1 &= -9x - 0.1\lambda_1 + 8.002 x \lambda_1 - x \lambda_1^2 - x \lambda_1 \lambda_2, \\
\dot{\lambda}_2 &= -4x - 0.1\lambda_2 + 7.002 x \lambda_2 - x \lambda_2^2 - x \lambda_1 \lambda_2, \\
x(0) &= 1, \\
\lambda_1(1) &= 0, \quad \lambda_2(1) = 0.
\end{aligned}
\]
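
The right-hand sides of this system can be cross-checked against conditions (2)-(4) numerically. The Python sketch below is illustrative only: it assumes the running payoffs are $3xu_1 - u_1^2/2$ and $2xu_2 - u_2^2/2$, solves $\partial H_i / \partial u_i = 0$ for the controls by hand, and approximates $-\partial H_i / \partial x$ by central differences; the function names and the sample point are arbitrary choices.

```python
def dynamics(x, u1, u2):
    """State equation of Example 3."""
    return 0.1 * x - 0.001 * x ** 2 - x * u1 - x * u2


def hamiltonians(x, u1, u2, l1, l2):
    """H_i = K_i + lambda_i * f, with K_i as assumed in the lead-in."""
    f = dynamics(x, u1, u2)
    h1 = 3 * x * u1 - 0.5 * u1 ** 2 + l1 * f
    h2 = 2 * x * u2 - 0.5 * u2 ** 2 + l2 * f
    return h1, h2


def stationary_controls(x, l1, l2):
    """Solve dH_i/du_i = 0: 3x - u1 - l1*x = 0 and 2x - u2 - l2*x = 0."""
    return x * (3 - l1), x * (2 - l2)


def rhs_closed_form(x, l1, l2):
    """Right-hand sides of the TPBVP in closed form."""
    return (0.1 * x - 5.001 * x ** 2 + x ** 2 * l1 + x ** 2 * l2,
            -9 * x - 0.1 * l1 + 8.002 * x * l1 - x * l1 ** 2 - x * l1 * l2,
            -4 * x - 0.1 * l2 + 7.002 * x * l2 - x * l2 ** 2 - x * l1 * l2)


def rhs_from_conditions(x, l1, l2, h=1e-5):
    """The same right-hand sides, rebuilt from conditions (2)-(4)."""
    u1, u2 = stationary_controls(x, l1, l2)
    hp = hamiltonians(x + h, u1, u2, l1, l2)
    hm = hamiltonians(x - h, u1, u2, l1, l2)
    # lambda_i' = -dH_i/dx, with the open-loop controls held fixed.
    return (dynamics(x, u1, u2),
            -(hp[0] - hm[0]) / (2 * h),
            -(hp[1] - hm[1]) / (2 * h))


# Compare both constructions at an arbitrary sample point.
print(rhs_closed_form(0.8, 0.3, -0.2))
print(rhs_from_conditions(0.8, 0.3, -0.2))
```

The two printed triples should agree up to finite-difference rounding, which confirms that the quadratic terms in $\lambda_1$ and $\lambda_2$ come from substituting $u_1 = x(3 - \lambda_1)$ and $u_2 = x(2 - \lambda_2)$ into the adjoint equations.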


The numerical results for various values of $n$ are presented in Table 4.
Since the exact solution to this differential game is not available, the
accuracy and validity of the proposed method for this game are checked by
the error of the residuals, defined as follows:

\[
\| Res \|^2 = \int_0^1 \left( R_1^2(t) + R_2^2(t) + R_3^2(t) \right) dt,
\]

where $R_i(t)$, $i = 1, 2, 3$, are the residuals defined in the previous
section.


Table 4: Optimal payoff functionals J_1 and J_2 for Example 3 with error
norms and also the CPU time(s).

n    J_1^BTM          J_2^BTM          ||Res||^2       CPU time (s)
6    0.946161437829   0.452174552034   3.74 × 10^-5    3.931
8    0.946161294373   0.452174505059   5.44 × 10^-7    12.683
10   0.946161293220   0.452174504702   7.27 × 10^-9    35.334
12   0.946161293210   0.452174504699   9.16 × 10^-11   92.486


Because of the nonlinearity of the system of TPBVPs for this example, and
of the process of implementing the Tau method, solving this example is
expected to take more time than the previous examples. Table 4 confirms
this.


5 Conclusions 


In this paper, a formulation of the Bernoulli Tau method (BTM) was
established for approximating the open-loop Nash equilibrium in
nonlinear differential games over a finite time horizon. Using this approach,
the system of TPBVPs extracted from Pontryagin’s maximum principle was
reduced to a system of algebraic equations by expanding the solution
functions in terms of Bernoulli polynomials, which can be solved numerically
to determine the unknown coefficients. Finally, three examples were
presented and solved by this approach to validate the applicability and
accuracy of the present method. For the examples with known exact
solutions, the approximate solutions were in excellent agreement with the
exact solutions.